How Pairwise Coevolutionary Models Capture the Collective Residue Variability in Proteins?

Authors

  • Matteo Figliuzzi
  • Pierre Barrat-Charlaix
  • Martin Weigt
Abstract

Global coevolutionary models of homologous protein families, as constructed by direct coupling analysis (DCA), have recently gained popularity, in particular due to their capacity to accurately predict residue-residue contacts from sequence information alone, and thereby to facilitate tertiary and quaternary protein structure prediction. More recently, they have also been used to predict fitness effects of amino-acid substitutions in proteins, and to predict evolutionarily conserved protein-protein interactions. These models are based on two currently unjustified hypotheses: 1) correlations in the amino-acid usage of different positions result collectively from networks of direct couplings; and 2) pairwise couplings are sufficient to capture the amino-acid variability. Here, we propose a highly precise inference scheme based on Boltzmann-machine learning, which allows us to systematically address these hypotheses. We show how correlations are built up in a highly collective way by a large number of coupling paths, which are based on the protein's three-dimensional structure. We further find that pairwise coevolutionary models capture the collective residue variability across homologous proteins even for quantities which are not imposed by the inference procedure, like three-residue correlations, the clustered structure of protein families in sequence space or the sequence distances between homologs. These findings strongly suggest that pairwise coevolutionary models are actually sufficient to accurately capture the residue variability in homologous protein families.
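
For orientation, the pairwise model and the Boltzmann-machine learning rule mentioned in the abstract are commonly written as follows. This is a sketch in standard DCA notation, not taken from the abstract itself: the symbols $h_i$, $J_{ij}$, $f$, $p$, the learning rate $\epsilon$, and the sequence length $L$ are assumptions introduced here, and the paper's actual scheme may differ in details such as regularization or sequence reweighting.

```latex
% Pairwise (Potts) model over an aligned sequence (a_1, ..., a_L);
% h_i are single-site fields, J_ij are pairwise couplings, Z normalizes.
P(a_1,\dots,a_L) = \frac{1}{Z}\,
  \exp\Big( \sum_{i=1}^{L} h_i(a_i) + \sum_{1 \le i < j \le L} J_{ij}(a_i,a_j) \Big)

% Boltzmann-machine learning: gradient ascent on the log-likelihood.
% f_i, f_ij are empirical one- and two-site frequencies from the alignment;
% p_i, p_ij are the corresponding model marginals, typically estimated
% by Monte Carlo sampling from P.
h_i(a)      \leftarrow h_i(a)      + \epsilon\,\big[ f_i(a)      - p_i(a)      \big]
J_{ij}(a,b) \leftarrow J_{ij}(a,b) + \epsilon\,\big[ f_{ij}(a,b) - p_{ij}(a,b) \big]
```

At a fixed point of these updates the model reproduces the empirical one- and two-site frequencies, which is why three-residue correlations and sequence-distance distributions serve as tests that are not imposed by the fit.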


Similar resources

Maximum entropy models for antibody diversity.

Recognition of pathogens relies on families of proteins showing great diversity. Here we construct maximum entropy models of the sequence repertoire, building on recent experiments that provide a nearly exhaustive sampling of the IgM sequences in zebrafish. These models are based solely on pairwise correlations between residue positions but correctly capture the higher order statistical propert...


A New Framework for Analysis of Coevolutionary Systems - Directed Graph Representation and Random Walks.

Studying coevolutionary systems in the context of simplified models (i.e. games with pairwise interactions between coevolving solutions modelled as self plays) remains an open challenge since the rich underlying structures associated with pairwise-comparison-based fitness measures are often not taken fully into account. Although cyclic dynamics have been demonstrated in several contexts (such ...


Amino acid positions subject to multiple coevolutionary constraints can be robustly identified by their eigenvector network centrality scores.

As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted net...


Protein sites with more coevolutionary connections tend to evolve slower, while more variable protein families acquire higher coevolutionary connections

Background: Correlated mutation or coevolution of positions in a protein is tightly linked with the protein’s respective evolutionary rate. It is essential to investigate the intricate relationship between the extent of coevolution and the evolutionary variability exerted at individual protein sites, as well as the whole protein. In this study, we have used a reliable set of coevolutionary Meth...


Deep generative models of genetic variation capture mutation effects

The functions of proteins and RNAs are determined by a myriad of interactions between their constituent residues, but most quantitative models of how molecular phenotype depends on genotype must approximate this by simple additive effects. While recent models have relaxed this constraint to also account for pairwise interactions, these approaches do not provide a tractable path towards modeling...



Journal title:
  • Molecular Biology and Evolution

Volume 35, Issue 4

Pages -

Publication date: 2017